Robots that imitate humans


Cynthia Breazeal and Brian Scassellati 


The study of social learning in robotics has been motivated by both scientific
interest in the learning process and practical desires to produce machines that
are useful, flexible, and easy to use. In this review, we introduce the social and 
task-oriented aspects of robot imitation. We focus on methodologies for 
addressing two fundamental problems. First, how does the robot know what 
to imitate? And second, how does the robot map that perception onto its own 
action repertoire to replicate it? In the future, programming humanoid robots to 
perform new tasks might be as simple as showing them.

The study of the mechanisms that enable an
individual to acquire information or skills from
another individual has been a seminal topic in
many areas of cognitive science. For example,
ethologists attempt to understand how bees
communicate the location of food sources, to describe
how successive generations of blue-tits learn to
open milk cans, and to categorize the spread of tool
use in chimpanzee troops. Developmental
psychologists study the emergence of social learning
mechanisms in human infants from the very early
(but simple) imitative responses of the newborn [1]
to the complex replication of task goals that
toddlers demonstrate [2].

Research in robotics has focused on social
learning for many reasons. Commercial interest in
building robots that can be used by ordinary people
in their homes, their workplaces, and in public
spaces such as hospitals and museums invokes social
learning as a mechanism for allowing users to
customize systems to particular environments or
user preferences. Research in artificial intelligence
has focused on social learning as a possible means
for building machines that can acquire new
knowledge autonomously, and become increasingly
complex and capable without requiring
additional effort from human designers.
Other researchers implement models of social
behavior in machines to gain a deeper
understanding of social learning in animals
(including humans).


Differences between the study of social learning in 
animals and machines 

The methods for studying social learning in artificial
systems differ significantly from methods used to
study social learning in biological systems. When
studying animals, researchers attempt to determine
the minimal set of capabilities required to produce an
observed behavior. Precise taxonomies of the types of
required skill have been developed; however, none of
these is universally accepted (see Box 1). Although
these descriptions often focus on cognitive skills,
they do not completely capture the ways in which
these skills can be constructed or combined to produce
the observed behavior.

Whereas biological studies tend to be descriptive, 
studies of social learning in artificial systems are 
primarily generative; researchers attempt to 
construct a desired behavior from a minimal set of 
capabilities. These studies often use imprecise
definitions of the external behavior (often using 
the word imitation to mean any type of social 
learning), but can precisely specify the underlying 
mechanisms of the system (see Box 2). Although 
these methodological differences do produce 
terminology problems between these related 
disciplines, on the whole, the literature on social
learning in animals is a very accessible source of
inspiration for robots, both physical and simulated
(see Box 3).

Box 1. Taxonomies of social learning

There has been little consensus on operational definitions for many of
the behavioral terms used to describe social learning, although many 
taxonomies have been developed [a-c]. The following incomplete set of 
simplified definitions (adapted from [d]) is provided as an example of the 
range of behaviors considered under social learning. 

Let A and B represent two individuals or sub-populations of
individuals:


Imitation: A learns a behavior performed by B that is novel to A's
behavioral repertoire. A is capable of performing the behavior in the
absence of B.

Goal emulation: after observing B’s actions, A produces the same
end product as B. The form of A's behavior differs from B's.

Stimulus enhancement: A's attention is drawn to an object or location
as a result of B’s behavior.

Social support: A is more likely to learn B’s behavior because B's
performance produces a similar motivational state in A.

Exposure: as a result of A's association with B, both are exposed to 
comparable environments and thus acquire comparable behaviors. 

Social facilitation: an innate behavior is released in A as a result of B’s 
performance. 


Other attempts at categorizing types of social behavior have focused
on the distinction between the observable behavior and the underlying
behavioral goal [e]. For example, suppose a robot were to observe a
person picking up a paintbrush and applying paint to a wall. The robot
could imitate the surface form of this event by moving its arm through a
similar trajectory, perhaps even encountering a wall or a brush along the
way. However, the underlying organizational structure of applying paint
to a wall involves recognizing the intent of the action as well as the
usefulness of the tool in accomplishing the goal. Meltzoff [f] has noted
that by 18 months of age human children are capable of responding to
both the surface form and the intended action.


Many different underlying mechanisms can produce the same observable
behavior

There are many ways in which a robot can be made to
replicate the movement of a human. Animatronic
devices (such as those used in amusement parks)
continuously replay movements that have been
recorded either by manually putting the machine into
a sequence of postures or by using devices that record
the joint angles of a human actor. Although these
machines can perform very high fidelity playback,
they are non-interactive; they neither respond to
changes in their environment nor do they adapt to
new situations.

Other research has focused on the development of
robots that can learn to perform tasks by observing a
person perform that action. This technique, often

Box 2. Terms used to describe social learning in robotics

Imitative behavior refers to a robot's ability to replicate the movement of
a demonstrator [a]. This ability can either be learned or specified a priori.
For instance, in learning by imitation [b-d], the robot is given the ability
to engage in imitative behavior, which serves as a mechanism that
reinforces further learning and understanding. When the ability to
imitate is learned, called learning to imitate [e-g], the robot learns
how to solve the ‘correspondence problem’ through experience. In
learning by demonstration [h-j], a new task is acquired by the robot,
but this may or may not involve imitative behavior. In the case where
it does not, called task-level imitation, the robot learns how to perform
the physical task of the demonstrator (such as an assembly task [k,l])
without imitating the behaviors of the demonstrator. When given
knowledge of the task goal, robots have learned to perform a physical
task (e.g. learning the game of ‘ball in cup’ [m], or a tennis forehand [n])
by making use of both the demonstrator’s movement and that of the
object. Finally, the ability of a robot to learn a novel task, where it
acquires both the goal and the manner of achieving it from
demonstration, is referred to as true imitation.

Box 3. Robotic platforms: physical and simulated

The robotic community has explored the topic of imitation on a wide
assortment of platforms, including physical robots and sophisticated 
physics-based simulations. 

Humanoid robots can engage in physical and social imitation tasks 
and serve as extremely compelling demonstrations. They are also 
expensive, challenging to build, and require continual maintenance. 
Some systems are primarily upper torsos [a-d], some are full-body 
systems [e], some are only a head with a vision system [f], and some 
have an expressive face [g]. Although many other full-body humanoid 
robots have been constructed (e.g. Honda’s child-sized Asimo and Sony’s
knee-height SDR-4X) they have not yet been used in social learning 
systems. Simpler robots, such as small mobile robots [h,i] or robot dolls 
[j], have also been used to explore the social dimension of imitation. 
Robotic arms are popular for exploring learning how to perform physical 
tasks by demonstration [k-o]. 

Physics-based 3-D rigid-body simulations of humanoid robots are a
popular alternative, allowing researchers to implement and evaluate 
systems quickly. Simulations produce results that are more easily 
replicated, as the software can often be shared among researchers. 

The primary difficulty with simulations is in transferring results from 
simulation to physical robots. Solutions that tend to work even in
complex simulations often fail in the real world because of the inherent 
lower fidelity of simulations. A few collaborations exist allowing 
researchers who work mostly with simulated humanoids to test their 
theories and implementations on actual robots [p,q].

called ‘learning from demonstration’, has been
reviewed in detail by Schaal [3]. Early explorations
did not focus on perceiving the movement of the
human demonstrator, but rather focused on
observing the effects of those movements on objects
in the environment (such as stacking blocks [4]
or peg insertion [5]). In other work, the robot
observes the human's performance as well, using
both object and human movement information to
estimate a control policy for the desired task.
Providing the robot with knowledge of the goal
(in the form of an evaluation function) allows the
robot to further improve its performance through
trial and error, for instance, for a ‘ball-in-cup’ task [6]
or the task of playing air hockey (Fig. 1). Atkeson
and Schaal [7] demonstrated that far fewer
real-world practice trials were needed if the robot
could simulate its experience using a predictive
forward model for a pendulum-swing-up task.
Although systems that learn from demonstration
have been programmed to perform impressive
feats, the systems are limited by the fact that
information flows only in one direction: from
human to machine.

Fig. 1. DB, a full-torso humanoid robot offered commercially by Sarcos, which can learn to play air
hockey by observing the movements that a human player makes. The robot's visual system attends to
the green puck and the positions of the human player's red paddle. By playing against experienced
opponents, the robot learns to position its own paddle to defend its goal successfully and to shoot at
the opponent's goal.


Imitation and social interaction in robots 

Studies of social learning in robotic systems have
looked at a wide range of learning situations and
techniques. Initial studies of social learning in
robotics focused on allowing one robot to learn to
navigate through mazes [8] or an unknown
landscape [9] by using simple perception
(proximity and infrared sensors) to follow another
robot that was adept at maneuvering in the
environment. Other work in social learning for
autonomous robots addressed learning inter-personal
communication protocols between similar robots,
between robots with similar morphology but which
differ in scale [10], and with a human instructor [11].
Other approaches have looked at expressive
imitation involving facial displays and head
gestures [12-14].

Box 4. Movement primitives

‘Movement primitives’ (also referred to as perceptual-motor primitives, basis
behaviors, motor schemas, macro actions, or motor programs [a,b]) are a compact
representation of action sequences for generalized movements that accomplish a
goal. From a computational perspective, a movement primitive can be formalized
as a ‘control policy’, encoded using a few parameters in the form of a parameterized
motor controller for achieving a particular task [c,d]. Examples of movement
primitives include behaviors such as ‘walking’, ‘grasping’, or ‘reaching’, and they
are often characterized as discrete straight-line movements, continuous oscillatory
movements, or postures [e]. The primitives of a system serve as the basis set of
motor programs (a movement ‘vocabulary’), which are sufficient, through
combination operators, for generating the robot's entire movement repertoire.
The primitives allow positions and trajectories to be represented with fewer
parameters, although with a corresponding loss of granularity and/or generality.
As a result, more recent work has focused on using imitation as a way of acquiring
new primitives (as new sequences or combinations of existing primitives) that can
be added to the repertoire [f,g].
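
The control-policy view of a movement primitive described in Box 4 can be made concrete with a small sketch. It is only an illustration under strong assumptions: a single joint under velocity control, forward-Euler integration, and two hypothetical primitive classes (`Reach` for a discrete movement, `Oscillate` for a rhythmic one) combined by a hypothetical `sequence` operator. None of these names or parameter values come from the systems reviewed here.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

# A control policy maps (joint angle, time since the primitive started)
# to a commanded joint velocity. Single-joint velocity control is an
# assumption made to keep the sketch small.
Policy = Callable[[float, float], float]


@dataclass(frozen=True)
class Reach:
    """Discrete primitive: drive the joint toward a goal angle."""
    goal: float          # target joint angle (rad)
    gain: float = 4.0    # proportional gain (1/s), illustrative value

    def __call__(self, angle: float, t: float) -> float:
        return self.gain * (self.goal - angle)


@dataclass(frozen=True)
class Oscillate:
    """Oscillatory primitive: sinusoidal swing around the starting angle."""
    amplitude: float     # rad
    frequency: float     # Hz

    def __call__(self, angle: float, t: float) -> float:
        omega = 2.0 * math.pi * self.frequency
        # Velocity of amplitude * sin(omega * t).
        return self.amplitude * omega * math.cos(omega * t)


def run(policy: Policy, angle: float, steps: int, dt: float = 0.01) -> List[float]:
    """Integrate one primitive with forward Euler; return the joint trajectory."""
    trajectory = []
    for i in range(steps):
        angle += policy(angle, i * dt) * dt
        trajectory.append(angle)
    return trajectory


def sequence(plan: Sequence[Tuple[Policy, int]], angle: float,
             dt: float = 0.01) -> List[float]:
    """Combination operator: run primitives one after another."""
    trajectory: List[float] = []
    for policy, steps in plan:
        segment = run(policy, angle, steps, dt)
        trajectory.extend(segment)
        if segment:
            angle = segment[-1]
    return trajectory


# Usage: reach to 1.0 rad, then wave around that posture.
wave = sequence(
    [(Reach(goal=1.0), 300), (Oscillate(amplitude=0.2, frequency=1.0), 200)],
    angle=0.0,
)
```

Each primitive is fully described by one or two numbers, which illustrates the trade-off noted in Box 4: the representation is compact, but only movements expressible as combinations of the available primitives can be produced.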

Although the individual tasks in each of these
studies varied considerably, each of these studies
looked at social interaction as a means to address two
fundamental issues. First, how does the robot decide
what to imitate? Second, how does the robot act upon
that decision to perform a similar action? For
simplicity, in the following discussion we look only at
systems that involve social learning between a
human and a robot that has a similar physical body
structure to a human (see [15] for a discussion of the
difficulties that arise when body structures are
radically different).


How does a robot know what to imitate?

When attempting to imitate a human, how does the
robot determine what perceptual aspects are relevant
to the task? The robot needs to detect the demonstrator,
observe his or her actions, and determine which are
relevant to the task, which are part of the instructional
process, and which are circumstantial [16]. This is a
challenging problem for perceptual systems and
involves not only the ability to perceive human
movement, but also the abilities to determine saliency
(i.e. what is important) and to direct attention.


Perception of movement 

The visual perception of 3-D movement of humans or
objects continues to be a difficult problem for robot
vision systems. This problem can be avoided by using
motion capture technologies, such as an externally
worn exoskeleton that measures joint angle
(e.g. a Sarcos SenSuit), or placing magnetic markers
on certain joints and tracking them (e.g. the FastTrak
system) [17]. Other simplifications, such as marking
relevant objects with magnetic tags or distinctive
colors, are often used [4,5,7,18,19].

More general solutions to the problem of
perceiving human movement through vision have yet
to be realized [20,21], but many researchers are
turning to techniques such as hidden Markov
models [22], or perceptual-motor primitives
(see Box 4) [23,24] to provide basic information on
how a human is moving in a visual scene. These
techniques combine task-based knowledge with
predictive models in an attempt to link expectations
of what the scene should look like with sensory data.
Although these techniques can provide information
on how a person is moving, subsequent extensive
tuning to the particular robot and environment is
often necessary to produce usable data.


Attention 

The problems of perception are closely tied to
models of attention. Some attention models
selectively direct computational resources to areas
containing task-related information. They do this
either by using fixed criteria [23,25] (such as
‘always look at red objects when trying to pick
apples’) or by using adaptive models that modify
the attentional process based on the robot's social
context and internal state. For example, the
humanoid robot Cog (see Fig. 2) was biased to attend
to objects with colors that matched skin tones when it
was ‘lonely’, and to attend to objects that were
brightly colored when ‘bored’ [26]. Another strategy is
to use imitative behavior as an implicit attentional
mechanism that allows the imitator to share a
similar perceptual state with the demonstrator [27,9].
This approach is used in the ‘learning-by-imitation’
paradigm, in which the ability to imitate is given
a priori and acts as a mechanism for reinforcing
further learning and understanding. Hence, as
Demiris and Hayes put it, ‘the learner isn’t imitating
because it understands what the demonstrator is
showing, but instead learns to understand because it
is imitating’ [24]. For instance, those authors used
this technique to teach a robot a control policy for
how to traverse a series of corridors by following
another robot [8].

Fig. 2. Cog, an upper-torso robot capable of mimicking arm gestures. Cog uses an attention system
based on models of human visual attention to locate multiple objects of interest in the environment
(such as the author's hand here), selects object trajectories that display animate characteristics
(i.e. trajectories that display self-propelled motion) and that the human instructor is attending to
(based on the instructor's head orientation), and attempts to map these trajectories onto the
movement of its own arm.
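
The adaptive attention model described above for Cog, in which the robot's internal state changes what it finds salient, can be illustrated with a minimal sketch. This is not Cog's actual attention system: the weighted-sum form, the feature names, the gain values, and the `Candidate`, `salience`, and `attend` names are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Sequence


@dataclass(frozen=True)
class Candidate:
    """A region of the image with hypothetical feature scores in [0, 1]."""
    name: str
    skin_tone: float    # how closely the region's color matches skin tones
    saturation: float   # how brightly colored the region is
    motion: float       # how much the region is moving


# Illustrative gains: the internal state re-weights the same features.
GAINS: Dict[str, Dict[str, float]] = {
    "lonely": {"skin_tone": 3.0, "saturation": 1.0, "motion": 1.0},
    "bored": {"skin_tone": 1.0, "saturation": 3.0, "motion": 1.0},
}


def salience(candidate: Candidate, state: str) -> float:
    """Weighted sum of feature scores under the given internal state."""
    gains = GAINS[state]
    return (gains["skin_tone"] * candidate.skin_tone
            + gains["saturation"] * candidate.saturation
            + gains["motion"] * candidate.motion)


def attend(candidates: Sequence[Candidate], state: str) -> Candidate:
    """Return the candidate the robot should look at in this state."""
    return max(candidates, key=lambda c: salience(c, state))


# Usage: the same scene yields a different target in each state.
scene = [
    Candidate("hand", skin_tone=0.9, saturation=0.2, motion=0.3),
    Candidate("toy", skin_tone=0.1, saturation=0.9, motion=0.3),
]
lonely_target = attend(scene, "lonely").name
bored_target = attend(scene, "bored").name
```

A fixed-criteria model corresponds to using a single, constant gain table; the adaptive case differs only in that the table is selected by the robot's current state.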

Shared attention, the ability to attend to the
demonstrator’s object of attention, has also been
explored as a means for a robot to determine critical
task elements [13]. Many machine vision systems
have looked at the problems of identifying cues
that indicate attention, such as pointing [28],
head pose [29], or gaze direction [30]. However,
only in the past few years has it become practical
to use these systems in real time on robotic
systems [31,32].


How does a robot know how to imitate? 

Once a relevant action has been perceived, the
robot must convert that perception into a
sequence of its own motor responses to achieve
the same result. Nehaniv and Dautenhahn have
termed this the correspondence problem [15].
Although it is possible to specify the solution to
the correspondence problem a priori, this is
practical only in simple systems that use the
learning-by-imitation paradigm described
above. When the solution to the correspondence
problem is acquired through experience, more
complex perceptions and actions can be
accommodated, and this is then referred to as
‘learning to imitate’.

Fig. 3. Adonis, a rich physics-based simulation of a humanoid upper torso, which has learned to
dance the Macarena based on motion capture data from a human dancer. Adonis uses motion
primitives to map the recorded movements of the human dancer to the range of possible motions that
it is capable of performing.


Representing perceived movement in motor-based terms

One strategy to attempt to solve the correspondence
problem is to represent the demonstrator’s movement
trajectory in the coordinate frame of the imitator’s
motor coordinates. This approach was explored by
Billard and Schaal [33], who recorded human arm
movement data using a Sarcos SenSuit, and then
projected that data into an intrinsic frame of
reference for a 41-degree-of-freedom humanoid
simulation [34]. Another approach, the use of
perceptual-motor primitives [35,36], is inspired
by the discovery of ‘mirror neurons’ in primates,
which are active both when a goal-oriented action is
observed and when the same action is performed
[37-40]. Mataric adapted this idea to allow a
simulated upper-torso humanoid robot to learn to
imitate a sequence of arm trajectories [23] (see Fig. 3
and Box 4).
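
To make the idea of expressing an observed trajectory in the imitator's motor coordinates more tangible, the following sketch maps an observed 2-D hand path onto the joint angles of a planar two-link arm using closed-form inverse kinematics. It is a deliberately simplified stand-in and not the method of Billard and Schaal [33]: the two-link arm, the link lengths, the uniform scaling by arm reach, and the function names are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Joints = Tuple[float, float]  # (shoulder angle, elbow angle) in radians


def forward(shoulder: float, elbow: float,
            l1: float = 0.30, l2: float = 0.25) -> Point:
    """Hand position of a planar two-link arm with link lengths l1, l2 (m)."""
    x = l1 * math.cos(shoulder) + l2 * math.cos(shoulder + elbow)
    y = l1 * math.sin(shoulder) + l2 * math.sin(shoulder + elbow)
    return x, y


def inverse(x: float, y: float,
            l1: float = 0.30, l2: float = 0.25) -> Joints:
    """Joint angles that place the hand at (x, y); elbow angle in [0, pi]."""
    cos_elbow = (x * x + y * y - l1 * l1 - l2 * l2) / (2.0 * l1 * l2)
    if abs(cos_elbow) > 1.0 + 1e-9:
        raise ValueError("target is outside the arm's reachable workspace")
    cos_elbow = max(-1.0, min(1.0, cos_elbow))  # guard against round-off
    elbow = math.acos(cos_elbow)
    shoulder = math.atan2(y, x) - math.atan2(
        l2 * math.sin(elbow), l1 + l2 * math.cos(elbow))
    return shoulder, elbow


def to_motor_coordinates(hand_path: Sequence[Point], demonstrator_reach: float,
                         l1: float = 0.30, l2: float = 0.25) -> List[Joints]:
    """Scale the demonstrator's hand path to the robot's reach, then solve
    for the robot's joint angles at each point."""
    scale = (l1 + l2) / demonstrator_reach
    return [inverse(x * scale, y * scale, l1, l2) for x, y in hand_path]


# Usage: a short observed hand path from a demonstrator with 0.7 m reach.
observed = [(0.5, 0.1), (0.3, 0.4)]
joint_path = to_motor_coordinates(observed, demonstrator_reach=0.7)
```

Even in this toy setting the mapping requires choices (how to scale, which elbow configuration to prefer) that have no counterpart in the observation itself, which is one reason the correspondence problem becomes hard for bodies with many degrees of freedom.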


Representing motor movements in task-based terms

An alternative to converting perceptions into motor
responses is to represent the imitator’s motor acts in
task space, where they can be compared directly with
the observed trajectory. Predictive forward models
have been proposed as a way to relate observed
movement to those motor acts that the robot can
perform [19,24,41,42]. Their power has been
demonstrated in model-based imitation learning:
Atkeson and Schaal have shown how a forward
model and a priori knowledge of the task goal can
be used to acquire a task-level policy from
reinforcement learning in very few trials [18].
They demonstrated an anthropomorphic robot
learning how to perform a pole-balancing task in a
single trial, and a pendulum swing-up task in three
to four trials [18,19].

Demiris and Hayes [24] present a related
technique that emphasizes the bi-directional
interaction between perception and action, whereby
movement recognition is directly accomplished
by the movement-generating mechanisms. They
call this ‘active imitation’ to distinguish it from
passive imitation (which follows a one-way
perceive-recognize-act sequence). To accomplish
this, a forward model for a behavior is built directly
into the behavior module responsible for producing
that movement.
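
The idea that recognition can be carried out by the movement-generating modules themselves can be sketched as follows. Each behavior pairs a controller with a forward model; an observed movement is recognized as the behavior whose predictions best match what was seen. This is a toy illustration and not the implementation of Demiris and Hayes [24]: the one-dimensional state, the additive forward model, the squared-error score, and all names are assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence, Tuple


@dataclass(frozen=True)
class Behavior:
    """A behavior module with its own controller and forward model."""
    name: str
    command: Callable[[float], float]               # state -> motor command
    forward_model: Callable[[float, float], float]  # (state, command) -> next state


def additive(state: float, command: float) -> float:
    """Hypothetical forward model: the command is a per-step displacement."""
    return state + command


def recognize(behaviors: Sequence[Behavior],
              observed: Sequence[float]) -> Tuple[str, Dict[str, float]]:
    """Score each behavior by how well its forward model predicts the
    observed state sequence; return the best match and all errors."""
    errors = {b.name: 0.0 for b in behaviors}
    for state, next_state in zip(observed, observed[1:]):
        for b in behaviors:
            predicted = b.forward_model(state, b.command(state))
            errors[b.name] += (predicted - next_state) ** 2
    best = min(errors, key=lambda name: errors[name])
    return best, errors


# Usage: three candidate behaviors for a single joint.
repertoire = [
    Behavior("raise", lambda s: 0.1, additive),
    Behavior("lower", lambda s: -0.1, additive),
    Behavior("hold", lambda s: 0.0, additive),
]
demonstration = [0.0, 0.1, 0.2, 0.3]  # observed joint angle over time
best, errors = recognize(repertoire, demonstration)
```

Because the winning module already contains the controller that generates the movement, recognizing the demonstration and being ready to reproduce it are the same step, which is the sense in which perception and action interact in both directions.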


Conclusion 

Imitation-inspired mechanisms have played three
dominant (and related) roles in robotics research to
date. First, imitation can be an easy way to program a
robot to perform novel actions simply by observing a
demonstration (see Fig. 1). Second, imitation can be
a mechanism for communicating (between a robot
and a human or between two robots). Shared
meaning for gestures (Fig. 3) or a lexicon (Fig. 4)
has been accomplished by learning to map shared
sensory-motor experiences between two different
bodies (robot to human, or robot to robot). ‘Learning to
imitate’ frames the motor learning problem as one of
acquiring a mapping between a perceived behavior
and the underlying movement primitives. By
representing perceptual-motor primitives as
predictive forward models, both the observation and
the output of the primitive share the same coordinate
representation, so measuring similarity is
computationally efficient. A solution to the
correspondence problem is not given to the robot in
‘learning by imitation’. Instead, the learner acquires
a state-action policy by following the model and
thereby sharing a similar perceptual and motor state
[8,9,27]. This mapping often represents a shared
inter-personal communication protocol, where the
model announces the labels for particular
sensory-motor states as they occur and the follower
learns their association.

Fig. 4. Robota is a robot doll currently under development at USC.
It is able to mimic a few simple gestures of a person wearing infrared
markers, such as raising an arm or turning one’s head. The
demonstrator presses a sequence of keys on a keyboard (each key
represents a label such as ‘move’, ‘arm’, ‘left’, etc.), at the same time as
performing the corresponding gesture. Using a recurrent, associative
neural network, the doll learns the association between the sequence of
keystrokes and how they map onto its actions and perceptions of
different parts of its body. After training, the demonstrator can press a
new sequence of keys without performing the corresponding gesture,
and the robot performs it.


Questions for future research

Just as children develop the ability to imitate the goal of an action rather than a
specific act, can we construct robots that are capable of making this inference?
Today's robots respond only to the observable behavior without any
understanding of the intent of an action.

Who should the robot learn from, and when is imitative learning appropriate?
Robots that imitate humans today are programmed to imitate any human
within view.

Can robots capitalize on the two-way communication of social interactions to
enhance learning? What capabilities would be gained if the robot could
interrupt an instructional session to ask questions, or when the instructor
notices that the robot is performing an action incorrectly?

Third, imitation has been an effective tool for
efficient motor learning in high-dimensional spaces.
For a humanoid robot with many articulated joints,
the state-action space becomes prohibitively large to
search for a viable solution in reasonable time. The
issue of learning efficiency has been addressed both
by building more compact state-action spaces using
movement primitives (Box 4) (inspired by their
biological counterpart [40]), and by constraining the
search through state-action space by using a human
demonstration of the skill as an example [3].
Alternatively, a predictive forward model can be
learned from the human demonstration, and used as
simulated experience to accelerate trial-and-error
learning [7].

Imitation and other forms of social learning hold
tremendous promise as a powerful means for robots
(humanoid and otherwise) to acquire new tasks and
skills. Unfortunately, the most advanced robots we
have currently are less adept than 2-year-old children
at imitating the actions and goals of people. This
review focused on two fundamental issues (what to
imitate and how to imitate) that are far from solved,
but there are many other important research areas
that need to be addressed (see Questions for future
research). It is our belief that research on these issues
in artificial systems will both benefit from, and
inform, research on imitation in biological systems.
The synthetic approach of building systems that
imitate requires attention to details that are often
not part of the analytic study of social behavior in
animals. For example, the process of selecting which
object to imitate is not often addressed in literature on
animal social learning but is a critical part of any
robotic implementation. Further, we believe that
imitating robots offer unique tools to evaluate and
explore models of animal (and human) behavior.
Just as simulations of neural networks have been
useful in evaluating the applicability of models
of neural function, these robots can serve as a
test-bed for evaluating models of human and animal
social learning.

Imitation is a sophisticated form of socially
mediated learning. To date, however, robots that
learn by imitation-inspired mechanisms are not
particularly social themselves. In the examples
above, the interaction is in one direction, from
demonstrator (or model) to learner, rather than there
being a bi-directional exchange of information. In
human infants, imitation is hypothesized to play an
important early role in the development of social
cognition, serving as a discovery procedure for
understanding persons, and providing the earliest
‘like me’ experiences of the self in relation to others
[2]. Beyond ease of programming and skill transfer
from human to robot, imitation could one day play a
role in understanding the social cognition of robots as
they begin to co-exist with people.